Abstract

We consider the weak convergence of numerical methods for stochastic differential equations (SDEs). Weak 
convergence is usually expressed in terms of the convergence of expected values of test functions of the trajectories. 
Here we present an alternative formulation of weak convergence in terms of the well-known Prokhorov metric on 
spaces of random variables. For a general class of methods, we establish bounds on the rates of convergence in 
terms of the Prokhorov metric. In doing so, we revisit the original proofs of weak convergence and show explicitly 
how the bounds on the error depend on the smoothness of the test functions. As an application of our result, we use 
the Strassen-Dudley theorem to show that the numerical approximation and the true solution to the system of SDEs
can be re-embedded in a probability space in such a way that the method converges there in a strong sense. One 
corollary of this last result is that the method converges in the Wasserstein distance, another metric on spaces of 
random variables. Another corollary establishes rates of convergence for expected values of test functions assuming 
only local Lipschitz continuity. We conclude with a review of the existing results for pathwise convergence of 
weakly converging methods and the corresponding strong results available under re-embedding. 

1 Introduction 

Consider the following system of Ito stochastic differential equations (SDEs) 

$$dX = a(X)\,dt + \sum_{r=1}^{q}\sigma_r(X)\,dW_r(t), \qquad X(0) = x_0, \tag{1.1}$$

for $X(t) \in \mathbb{R}^n$, where the $W_r(t)$ are independent standard Wiener processes. The simplest numerical method for obtaining approximate solutions to this system is the Euler-Maruyama method: for $k \ge 0$, timestep $\Delta t$, and $\Delta_k W_r = W_r((k+1)\Delta t) - W_r(k\Delta t)$,

$$X_{k+1} = X_k + a(X_k)\,\Delta t + \sum_{r=1}^{q}\sigma_r(X_k)\,\Delta_k W_r, \qquad X_0 = x_0. \tag{1.2}$$

For each $k$, $X_k$ is an approximation to $X(k\Delta t)$. The Euler-Maruyama method converges in the strong sense because, for each realization of the $W_r(t)$, the method gives an approximation to the exact solution of the SDE with that same realization. In particular, as shown in [9, p. 342],

$$\left(\mathbb{E}\,|X(T) - X_{T/\Delta t}|^2\right)^{1/2} \le C\,\Delta t^{1/2} \tag{1.3}$$

under certain assumptions on the coefficients $a$ and $\sigma_r$. For such a result to be possible, $X(T)$ and $X_{T/\Delta t}$ must be defined on the same probability space.
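The strong error in (1.3) can be checked numerically whenever the exact solution is available pathwise. The Python sketch below is only an illustration under assumptions of our own: it takes the scalar geometric Brownian motion $dX = aX\,dt + bX\,dW$ (so $n = q = 1$), whose exact solution $X(T) = x_0\exp((a - b^2/2)T + bW(T))$ is built from the same simulated increments as (1.2), and it estimates the left-hand side of (1.3) by Monte Carlo with Python's `random` module. The function name `em_strong_error` is hypothetical.

```python
import math
import random


def em_strong_error(a, b, x0, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of (E|X(T) - X_N|^2)^(1/2) for Euler-Maruyama (1.2)
    applied to the scalar SDE dX = a X dt + b X dW (geometric Brownian motion).

    The exact solution X(T) = x0 exp((a - b^2/2) T + b W(T)) is built from the
    same Brownian increments as the scheme, so both live on one probability space.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sq_err = 0.0
    for _ in range(n_paths):
        x, w = x0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Delta_k W ~ N(0, dt)
            x = x + a * x * dt + b * x * dw     # one Euler-Maruyama step
            w += dw                             # running value of W(k dt)
        exact = x0 * math.exp((a - 0.5 * b * b) * T + b * w)
        sq_err += (exact - x) ** 2
    return math.sqrt(sq_err / n_paths)


coarse = em_strong_error(1.0, 0.5, 1.0, 1.0, n_steps=10, n_paths=1000, seed=1)
fine = em_strong_error(1.0, 0.5, 1.0, 1.0, n_steps=100, n_paths=1000, seed=2)
print(coarse, fine)
```

Refining $\Delta t$ by a factor of 10 should shrink the estimate, asymptotically by about $\sqrt{10}$, in line with the exponent $1/2$ in (1.3), up to Monte Carlo noise. The comparison only makes sense because the scheme and the exact solution share one realization of $W$.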

Another way to quantify convergence of a numerical method is to consider the distribution of the random variable generated by the numerical method and see how close it is to the distribution of the true trajectory at the corresponding point in time. This concept is known as weak convergence. The typical way to quantify it is through test functions. For example, for sufficiently smooth functions $f$ the Euler-Maruyama method (1.2) satisfies

$$\left|\mathbb{E}f(X_{T/\Delta t}) - \mathbb{E}f(X(T))\right| \le C_f\,\Delta t \tag{1.4}$$

for some constant $C_f$ depending on $f$. See [9, p. 473], or [14] for an important earlier reference.

Strong convergence of a method implies weak convergence, but the converse is not true. For example, let $N_{rk}$ be independent identically distributed random variables with $N_{rk} = \pm 1$, each with probability $1/2$, and set

$$X_{k+1} = X_k + a(X_k)\,\Delta t + \sum_{r=1}^{q}\sigma_r(X_k)\,\Delta t^{1/2} N_{rk}. \tag{1.5}$$

We can define the $N_{rk}$ in various ways. One possibility is for them to be independent of the Wiener processes driving the SDE (1.1); another is to choose $N_{rk} = \operatorname{sgn}(\Delta_k W_r)$. In either case, this method, which we call weak Euler-Maruyama, does not converge strongly to the solution of the SDE. However, (1.4) still holds whatever the relation between the $N_{rk}$ and the Wiener processes, even if they are defined on different probability spaces.
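For a scalar linear test problem the expectation in (1.4) can be computed exactly for weak Euler-Maruyama, without any sampling. Assuming (our choice, for illustration only) $dX = aX\,dt + bX\,dW$ with $q = 1$, each step of (1.5) multiplies $X_k$ by $1 + a\Delta t \pm b\Delta t^{1/2}$, so $X_{T/\Delta t}$ depends only on the number of $+1$ signs, which is binomially distributed. The sketch compares $\mathbb{E}f(X_{T/\Delta t})$ with $\mathbb{E}f(X(T)) = x_0^2 e^{(2a+b^2)T}$ for $f(x) = x^2$; the function names are hypothetical.

```python
import math


def weak_em_expectation(f, a, b, x0, T, n_steps):
    """Exact E f(X_N) for weak Euler-Maruyama (1.5) on dX = a X dt + b X dW.

    Each step multiplies X by 1 + a dt + b sqrt(dt) N_k with N_k = +1 or -1
    (probability 1/2 each), so X_N depends only on the number j of +1 signs,
    and j is Binomial(N, 1/2).
    """
    dt = T / n_steps
    up = 1.0 + a * dt + b * math.sqrt(dt)
    down = 1.0 + a * dt - b * math.sqrt(dt)
    total = 0.0
    for j in range(n_steps + 1):
        prob = math.comb(n_steps, j) / 2.0 ** n_steps
        total += prob * f(x0 * up ** j * down ** (n_steps - j))
    return total


def weak_error_second_moment(a, b, x0, T, n_steps):
    """Weak error for f(x) = x^2, using E X(T)^2 = x0^2 exp((2a + b^2) T)."""
    exact = x0 ** 2 * math.exp((2.0 * a + b * b) * T)
    approx = weak_em_expectation(lambda x: x * x, a, b, x0, T, n_steps)
    return abs(approx - exact)


for n in (50, 100, 200):
    print(n, weak_error_second_moment(1.0, 0.5, 1.0, 1.0, n))
```

Doubling `n_steps` (halving $\Delta t$) roughly halves the error, which is the first-order behaviour in (1.4), even though no path of the scheme is ever compared with a path of the SDE.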

In this paper, we present a different formulation of weak convergence of numerical methods in terms of the Prokhorov metric [2]. For any two random elements of $\mathbb{R}^n$ the Prokhorov metric gives a quantitative measure of how far apart their distributions are. Its importance in probability theory [3] is that convergence in the Prokhorov metric is equivalent to convergence in distribution (or weak convergence, as it is sometimes known). Before we discuss our results, we first review some of the definitions and facts of convergence in distribution in metric spaces. See Billingsley's book [2] for details.

Consider a metric space $S$ with metric $d$, such as $\mathbb{R}^n$ with the Euclidean metric. We say that a sequence of random elements $X_n$ in $S$ converges in distribution to $X$ in $S$ if for all bounded continuous $f : S \to \mathbb{R}$

$$\mathbb{E}f(X_n) \to \mathbb{E}f(X) \tag{1.6}$$

as $n \to \infty$. An equivalent definition of convergence in distribution of $X_n$ to $X$ is that for all Borel sets $A$ of $S$ with $P(X \in \partial A) = 0$ we have

$$P(X_n \in A) \to P(X \in A) \tag{1.7}$$

as $n \to \infty$. (Here $\partial A$ is the boundary of the set $A$.) The assumption of boundedness on $f$ may seem excessive, but in the presence of uniform bounds on the moments of $X_n$ and $X$, convergence in distribution implies that (1.6) holds for more general continuous $f$ 14J, p. 86].

It is not obvious from either of the above definitions of weak convergence how to measure the speed with which a sequence $X_n$ converges in distribution to $X$, since the rate at which the limits (1.6) and (1.7) occur depends on $f$ and $A$ respectively. The Prokhorov metric is one way to define the distance between the distributions of two random elements, and thus allows us to quantify convergence in distribution. For any two random elements $X$ and $Y$ of $S$ let $\rho(X,Y)$ be the Prokhorov distance between them (see Section 2 for the definition). This distance is zero if and only if $X$ and $Y$ have the same distribution, that is, if $P(X \in A) = P(Y \in A)$ for all Borel sets $A$. Moreover, if $X_n$ is a sequence of random elements in a separable metric space $S$, then $\rho(X_n, X) \to 0$ if and only if $X_n$ converges in distribution to $X$. Thus we say that the Prokhorov metric metrizes convergence in distribution.

In our case, we view the solution of the system of SDEs at time $T$ as a random vector (a random element of $\mathbb{R}^n$), and likewise for the numerical solution at time $T$. Then we ask how the Prokhorov distance between $X(T)$ and $X_{T/\Delta t}$ depends on $\Delta t$. Our main result in Section 3 shows that the usual definition of weak convergence in terms of test functions implies convergence in the Prokhorov metric, and we provide a bound on the rate. One important component of our proof is determining exactly how the constant $C_f$ in (1.4) depends on $f$ in the usual proofs of weak convergence [11, 9].

In Section 4 we show one consequence of our main result concerning re-embedding trajectories of the SDEs and of the numerical method in a new probability space. Two random vectors (such as $X(T)$ and $X_{T/\Delta t}$) may either not be close to each other on a realization-by-realization basis, or may be defined on completely different probability spaces. However, it is possible to define new random vectors $Y$ and $Z$ jointly on a new probability space such that $Y$ has the same distribution as $X(T)$ and $Z$ has the same distribution as $X_{T/\Delta t}$. This construction is called a re-embedding of $X(T)$ and $X_{T/\Delta t}$ in a new probability space. After re-embedding, the random vectors may be close together in a strong sense, and we can look at how quantities like $\mathbb{E}|Y - Z|$ or $P(|Y - Z| > \alpha)$ behave as $\Delta t$ varies. The Strassen-Dudley theorem says that if two random variables are close in the Prokhorov metric, then there is a re-embedding of them into another probability space for which they are close in probability. A bound on some higher moment of $Y$ and $Z$ then gives that $\mathbb{E}|Y - Z|$ is small. Using our bound in the Prokhorov metric and the Strassen-Dudley theorem, we show that a method with the usual weak convergence of order $p$ converges strongly after re-embedding with order $p/(2p+3) - \varepsilon$ for any $\varepsilon > 0$. This is equivalent to proving a rate of convergence in the Wasserstein distance (see Section 4 for a definition). We also use re-embedding to establish rates for the convergence of expectations of test functions requiring only local Lipschitz continuity and polynomial growth.

Finally, in Section 5, we discuss the corresponding result for weak convergence of entire numerical trajectories 
on $[0, T]$ to exact trajectories of the original system. Convergence in distribution follows directly from a result of
Stroock and Varadhan, which we review. However, to the best of our knowledge there is no bound available for 
this rate of convergence for general weakly convergent methods. (Though see IU5U 8I1 for some results for the strong 
Euler-Maruyama method.) Applying Skorohod's theorem gives the corresponding strong convergence result for the 
trajectories embedded in another probability space, though again without a rate. 



2 Metrics on Spaces of Random Elements 

Consider a metric space $(S,d)$. A random element of $S$ is a measurable function $X : \Omega \to S$, where $(\Omega, \mathcal{F}, P)$ is some probability space. For example, if $S$ is $\mathbb{R}^n$ with the metric $d(x,y) = |x - y|$, then random elements $X$ are called random vectors. Even if two random elements $X$ and $Y$ of a metric space $S$ are not close on a realization-by-realization basis, we may still wish to compare their distributions. So we define a metric on the space of random elements of $S$. Note that two distinct metrics are involved: $d$, which is a metric on the original space $S$, and another, which is a metric on the space of random elements of $S$. In this section, we first define the well-known Prokhorov metric $\rho$, which is defined for any underlying metric space. Then we introduce the metrics $\beta_l$ for non-negative integers $l$, when the underlying metric space is $\mathbb{R}^n$. The latter are similar to the metric $\beta$ of Fortet and Mourier [7]; see [3, Sec. 11.3].

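For a set $A \subset S$ and $\varepsilon > 0$ we define $A^\varepsilon$, the set of all points within distance $\varepsilon$ of $A$, by

$$A^\varepsilon = \{x \in S \mid \inf_{y \in A} d(x,y) < \varepsilon\}. \tag{2.1}$$

The Prokhorov metric is defined as follows.

Definition 1 For random elements $X$ and $Y$ of $S$,

$$\rho(X,Y) := \inf\{\varepsilon > 0 \mid P(X \in A) \le P(Y \in A^\varepsilon) + \varepsilon \ \text{for all closed } A\}.$$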

If we identify random elements of $S$ that have the same distribution, then $\rho$ is a metric on the set of random elements [3, p. 394]. If $(S,d)$ is separable (as are all examples in this paper), random elements $X_n$ converge in distribution to $X$ if and only if $\rho(X_n, X) \to 0$ [3, p. 395]. Note that $\rho(X,Y) \le 1$ always.
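Definition 1 can be evaluated by brute force when both distributions have finite support on the real line ($n = 1$, an assumption made only for this illustration). Restricting $A$ to subsets of the support of $X$ loses nothing: replacing a closed $A$ by $A \cap \operatorname{supp}(X)$ leaves $P(X \in A)$ unchanged and can only shrink $A^\varepsilon$. Since the condition in Definition 1 is monotone in $\varepsilon$, the infimum can then be located by bisection on $[0, 1]$. The function names below are hypothetical, and the enumeration of subsets makes this usable only for small supports.

```python
from itertools import combinations


def _feasible(xs, px, ys, py, eps):
    """Check P(X in A) <= P(Y in A^eps) + eps for every nonempty subset A of supp(X)."""
    m = len(xs)
    for size in range(1, m + 1):
        for idx in combinations(range(m), size):
            p_x = sum(px[i] for i in idx)
            # A^eps is the open eps-neighbourhood of the finite set A, as in (2.1).
            p_y = sum(q for y, q in zip(ys, py)
                      if min(abs(y - xs[i]) for i in idx) < eps)
            if p_x > p_y + eps + 1e-12:
                return False
    return True


def prokhorov_distance(xs, px, ys, py, tol=1e-9):
    """Bisection for rho(X, Y) when X, Y are finitely supported on the real line."""
    lo, hi = 0.0, 1.0  # rho(X, Y) <= 1 always
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _feasible(xs, px, ys, py, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Point masses at 0 and 0.3: the Prokhorov distance equals the shift.
print(prokhorov_distance([0.0], [1.0], [0.3], [1.0]))
```

For two point masses a distance $h < 1$ apart the result is $h$; if half of the mass of $Y$ is moved far away, the distance is $1/2$ however far it is moved, which illustrates that $\rho$ measures closeness in probability rather than in mean.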

Here is the Strassen-Dudley theorem as proven in J^UH], used later in this section and in Section 4.

Theorem 2.1 ([2, p. 73]) Let $(S,d)$ be a separable metric space. If $X$ and $\tilde X$ are random elements of $S$ with $\rho(X, \tilde X) < \alpha$, then there are random elements $Y$ and $Z$ of $S$ defined on a common probability space such that $Y$ has the same distribution as $X$, $Z$ has the same distribution as $\tilde X$, and

$$P(d(Y,Z) > \alpha) \le \alpha.$$

□

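We now define a class of metrics $\beta_l$ on random vectors, that is, random elements of the metric space $(\mathbb{R}^n, |\cdot|)$. Let $f : \mathbb{R}^k \to \mathbb{R}$. Let $\alpha$ be a vector of length $k$ with non-negative integer components. Let $|\alpha| := \sum_i \alpha_i$ and $D^\alpha f := \partial^{|\alpha|} f / \partial x_1^{\alpha_1} \cdots \partial x_k^{\alpha_k}$. If we wish to emphasize the argument of $f$ in our notation we use $D^\alpha_x$ instead of $D^\alpha$. For $l \ge 0$ and $f : \mathbb{R}^n \to \mathbb{R}$ let

$$\|f\|_l := \sum_{|\alpha| \le l} \sup_{x \in \mathbb{R}^n} |D^\alpha f(x)|. \tag{2.2}$$

Definition 2 For random vectors $X$ and $Y$ in $\mathbb{R}^n$ and for $l \ge 0$ we let

$$\beta_l(X,Y) := \sup_{\|f\|_l \le 1} \left|\mathbb{E}f(X) - \mathbb{E}f(Y)\right|.$$

It is straightforward to check that $\beta_l$ is a metric on the space of random vectors, again identifying random vectors that have the same distribution.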

The following theorem is the main result in this section, and allows us to show in the next section that solutions 
generated by weak numerical methods converge in the Prokhorov metric. 

Theorem 2.2 For each $l \ge 0$ there is a constant $C > 0$ such that for any random vectors $X$, $Y$ in $\mathbb{R}^n$,

$$\rho(X,Y) \le C\,\beta_l(X,Y)^{1/(l+1)}.$$

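Proof. Here we closely follow J2I p. 396]. Consider any closed set $K$ in $\mathbb{R}^n$ and $\varepsilon \in (0,1]$. Let $K^\varepsilon$ be defined as in Equation (2.1). The lemma following this theorem shows that there is a smooth function $f$ and a constant $C$ that depends on $n$ and $l$ but not on $\varepsilon$ or $K$ such that

$$1_K(x) \le f(x) \le 1_{K^\varepsilon}(x) \quad\text{and}\quad \|f\|_l \le C\varepsilon^{-l}.$$

Without loss of generality we assume that $C \ge 1$. We now use the function $f$ to establish the required bound. For any random vectors $X$ and $Y$,

$$\begin{aligned}
P(Y \in K) &\le \mathbb{E}f(Y) \\
&\le \mathbb{E}f(X) + \|f\|_l\,\beta_l(X,Y) \\
&\le P(X \in K^\varepsilon) + C\varepsilon^{-l}\beta_l(X,Y).
\end{aligned}$$

Since the Prokhorov metric is symmetric, for any $\varepsilon \in (0,1]$

$$\rho(X,Y) \le \max\left(\varepsilon,\ C\varepsilon^{-l}\beta_l(X,Y)\right). \tag{2.3}$$

Now if $\beta_l(X,Y) \ge 1$, then since $C \ge 1$ and $\rho(X,Y) \le 1$ the result is immediately true. So we assume that $\beta_l(X,Y) < 1$ and choose $\varepsilon$ so that $\beta_l(X,Y) = \varepsilon^{l+1}$. Then $\varepsilon < 1$ and $C\varepsilon^{-l}\beta_l(X,Y) = C\varepsilon$, so (2.3) gives us $\rho(X,Y) \le \varepsilon\max(1, C) = C\varepsilon$. So $\rho(X,Y) \le C\beta_l(X,Y)^{1/(l+1)}$, as required. □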

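Lemma 2.1 For each closed set $K \subset \mathbb{R}^n$ there is a parametrized family of functions $f_\varepsilon(x)$ for $\varepsilon \in (0,1]$ such that

$$1_K(x) \le f_\varepsilon(x) \le 1_{K^\varepsilon}(x), \tag{2.4}$$

and there is a constant $C$ depending on $n$ and $l$ but not on $\varepsilon$ or $K$ such that

$$\|f_\varepsilon\|_l \le C\varepsilon^{-l}. \tag{2.5}$$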



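Proof. We use the method of mollifiers; see, for example, [6, p. 629]. Define $\eta : \mathbb{R}^n \to \mathbb{R}$ by

$$\eta(x) := \begin{cases} D\exp\left(\dfrac{1}{|x|^2 - 1}\right) & \text{if } |x| < 1,\\[4pt] 0 & \text{if } |x| \ge 1,\end{cases}$$

where $D > 0$ is selected so that $\int_{\mathbb{R}^n} \eta(x)\,dx = 1$. The mollifier $\eta \in C^\infty$ is nonnegative, positive inside the unit ball about the origin, and zero outside it. Define

$$\eta_\varepsilon(x) := \frac{1}{\varepsilon^n}\,\eta\!\left(\frac{x}{\varepsilon}\right).$$

This function is in $C^\infty$, has support in the closed ball of radius $\varepsilon$ about the origin, and also has integral 1.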
Let $K'$ be the closure of $K^{\varepsilon/2}$ and let

$$f_\varepsilon(x) := \int_{\mathbb{R}^n} \eta_{\varepsilon/2}(y)\,1_{K'}(x - y)\,dy.$$

The function $f_\varepsilon$ is 1 on $K$, 0 on $\mathbb{R}^n \setminus K^\varepsilon$, and between zero and one elsewhere. So $f_\varepsilon$ satisfies condition (2.4).

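In [6, p. 630] it is shown that

$$D^\alpha f_\varepsilon(x) = \int_{\mathbb{R}^n} D^\alpha_x \eta_{\varepsilon/2}(x - y)\,1_{K'}(y)\,dy.$$

So

$$|D^\alpha f_\varepsilon(x)| \le \int_{\mathbb{R}^n} \left|D^\alpha_x \eta_{\varepsilon/2}(x - y)\right|dy = \int_{\mathbb{R}^n} (\varepsilon/2)^{-n-|\alpha|}\left|(D^\alpha \eta)\bigl(2(x - y)/\varepsilon\bigr)\right|dy = (\varepsilon/2)^{-|\alpha|}\int_{\mathbb{R}^n} |D^\alpha \eta(z)|\,dz,$$

where we have used the chain rule $D^\alpha_x\bigl[\eta(2(x-y)/\varepsilon)\bigr] = (2/\varepsilon)^{|\alpha|}(D^\alpha\eta)(2(x-y)/\varepsilon)$ and the change of variables $z = 2(x - y)/\varepsilon$, for which $dy = (\varepsilon/2)^n\,dz$. The integral in the last expression is finite and does not depend on $K$ or $\varepsilon$. Summing over all $\alpha$ with $|\alpha| \le l$ gives us

$$\|f_\varepsilon\|_l \le \sum_{|\alpha| \le l} C_\alpha\,\varepsilon^{-|\alpha|} \le C\varepsilon^{-l}$$

for some constants $C_\alpha$, $C$, for all $\varepsilon \in (0,1]$. □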
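The construction of $f_\varepsilon$ can be reproduced numerically in one dimension. The sketch below assumes $n = 1$ and $K = [\mathrm{lo}, \mathrm{hi}]$, so that $K' = [\mathrm{lo} - \varepsilon/2, \mathrm{hi} + \varepsilon/2]$. It normalizes $\eta$ with a midpoint rule (the constant `D`) and evaluates the convolution with the same rule after the substitution $y = (\varepsilon/2)s$, which turns $\eta_{\varepsilon/2}(y)\,dy$ into $\eta(s)\,ds$. It only checks the sandwich (2.4) at a few points; it does not attempt the derivative bound (2.5). The names `bump` and `smoothed_indicator` are hypothetical.

```python
import math


def bump(s):
    """Unnormalized mollifier exp(1/(s^2 - 1)) on (-1, 1), zero elsewhere (n = 1)."""
    return math.exp(1.0 / (s * s - 1.0)) if abs(s) < 1.0 else 0.0


M = 2000  # number of midpoint-rule nodes on (-1, 1)
NODES = [-1.0 + (k + 0.5) * 2.0 / M for k in range(M)]
D = 1.0 / (sum(bump(s) for s in NODES) * 2.0 / M)  # discrete integral of eta equals 1


def smoothed_indicator(x, lo, hi, eps):
    """f_eps(x) = int eta_{eps/2}(y) 1_{K'}(x - y) dy for K = [lo, hi].

    With y = (eps/2) s the integral becomes int eta(s) 1_{K'}(x - eps s / 2) ds,
    evaluated with the same midpoint rule used to normalize eta.
    """
    total = 0.0
    for s in NODES:
        y = 0.5 * eps * s
        if lo - 0.5 * eps <= x - y <= hi + 0.5 * eps:  # x - y lies in K'
            total += D * bump(s)
    return total * 2.0 / M


eps = 0.2
print(smoothed_indicator(0.5, 0.0, 1.0, eps))   # inside K
print(smoothed_indicator(1.1, 0.0, 1.0, eps))   # on the boundary of K'
print(smoothed_indicator(1.25, 0.0, 1.0, eps))  # outside K^eps
```

Inside $K$ the value is 1 (up to rounding), outside $K^\varepsilon$ it is exactly 0, and in between it rises smoothly; at the boundary of $K'$ it is one half by the symmetry of $\eta$.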
For completeness we include the following theorem, which together with Theorem 2.2 shows that the metrics $\rho$ and $\beta_l$ induce the same topology on the space of random elements of $\mathbb{R}^n$. Thus, as for $\rho$, $\beta_l(X_n, X) \to 0$ if and only if $X_n$ converges to $X$ in distribution. This result is analogous to Theorem 11.6.5 in

Theorem 2.3 For all $l \ge 1$ and random vectors $X$ and $\tilde X$ in $\mathbb{R}^n$, the metrics $\rho$ and $\beta_l$ satisfy

$$\beta_l(X,\tilde X) \le 2\rho(X,\tilde X).$$

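Proof. Let $\rho(X,\tilde X) = \varepsilon$ and fix $\alpha > \varepsilon$. Using Theorem 2.1, let $Y$ and $Z$ be random vectors on the same probability space with the same distributions as $X$ and $\tilde X$ respectively, such that $P(|Y - Z| > \alpha) \le \alpha$. Then

$$\begin{aligned}
\beta_l(X,\tilde X) &\le \sup_{\|f\|_l \le 1} \mathbb{E}|f(Y) - f(Z)| \\
&= \sup_{\|f\|_l \le 1}\Bigl\{\mathbb{E}\bigl[1_{\{|Y-Z|>\alpha\}}|f(Y)-f(Z)|\bigr] + \mathbb{E}\bigl[1_{\{|Y-Z|\le\alpha\}}|f(Y)-f(Z)|\bigr]\Bigr\} \\
&\le \sup_{\|f\|_l \le 1}\Bigl\{\mathbb{E}\bigl[1_{\{|Y-Z|>\alpha\}}\bigr]\,2\sup_x|f(x)| + \alpha\sup_{y \ne z}\frac{|f(y)-f(z)|}{|y-z|}\Bigr\} \\
&\le \sup_{\|f\|_l \le 1}\Bigl\{2\alpha\sup_x|f(x)| + \alpha\sum_i\sup_x|\partial_i f(x)|\Bigr\} \\
&\le 2\alpha.
\end{aligned}$$

Since $\alpha > \varepsilon$ was arbitrary, $\beta_l(X,\tilde X) \le 2\varepsilon$, as required. □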



3 Convergence of Numerical Methods 

Here we prove our result on the convergence in the Prokhorov metric of numerical approximations to exact solutions of SDEs. We consider the system of Ito SDEs

$$dX = a(X)\,dt + \sum_{r=1}^{q}\sigma_r(X)\,dW_r(t), \tag{3.1}$$

where $X(t) \in \mathbb{R}^n$, $a : \mathbb{R}^n \to \mathbb{R}^n$, and $\sigma_r : \mathbb{R}^n \to \mathbb{R}^n$ for all $r$. The $W_r$, $r = 1, \dots, q$, are mutually independent standard Wiener processes. We set the initial condition to be $X(0) = x_0 \in \mathbb{R}^n$.

To prove our convergence theorem we build on a weak convergence result from [11]. This result is expressed for a rather general method for the system (3.1):

$$X_{k+1} = X_k + \bar\delta(X_k, \Delta t; \xi_k), \tag{3.2}$$

with $X_0 = x_0$. Here $\bar\delta$ is a vector-valued function and $\xi_k$, $k = 0, 1, \dots$, is a sequence of independent random vectors. Usually we suppress the $\xi_k$ from the notation and view $\bar\delta(X_k, \Delta t)$ as a random vector. We denote its $i$th component by $\bar\delta_i(X_k, \Delta t)$. Here $X_k$ is intended to be an approximation to $X(k\Delta t)$. In the following we use $\delta$ to denote the increment of the true solution over a time interval: for the solution $X$ to Equation (3.1) with $X(0) = x$, set

$$\delta(x, \Delta t) = X(\Delta t) - X(0).$$

Thus $\delta(x, \Delta t)$, like $\bar\delta(x, \Delta t)$, is a random vector. The $i$th component of $\delta(x, \Delta t)$ is denoted $\delta_i(x, \Delta t)$.

Theorem 3.1 below gives a rate of convergence of $\mathbb{E}f(X_k)$ to $\mathbb{E}f(X(k\Delta t))$ in which the dependence of the constant on $f$ is given explicitly. This result is a corollary of the result of [11, p. 100] or [9, p. 473], in which the dependence of the constant on $f$ is not made explicit. Here, by making stronger assumptions on the coefficients $a$ and $\sigma_r$, we show that the constant is linear in $\|f\|_{2p+2}$, where $p$ is the order of the method. (See Section 2 for the definition of $\|\cdot\|_{2p+2}$.)

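Theorem 3.1 Let $T > 0$ be fixed. Suppose that

(a) the coefficients $a$ and $\sigma_r$ of the system of SDEs (3.1) have globally Lipschitz derivatives up to and including order $2p+2$;

(b) there is some scalar function $K(x)$ with at most polynomial growth as $|x| \to \infty$ such that

$$\left|\mathbb{E}\prod_{j=1}^{s}\delta_{i_j}(x,\Delta t) - \mathbb{E}\prod_{j=1}^{s}\bar\delta_{i_j}(x,\Delta t)\right| \le K(x)\,\Delta t^{p+1}$$

for $s = 1, \dots, 2p+1$ and all indices $i_1, \dots, i_s \in \{1, \dots, n\}$, and

$$\mathbb{E}\prod_{j=1}^{2p+2}\left|\bar\delta_{i_j}(x,\Delta t)\right| \le K(x)\,\Delta t^{p+1};$$

(c) for all $m \ge 1$ the expectations $\mathbb{E}|X_k|^{2m}$ exist and are uniformly bounded with respect to $\Delta t$ and $k = 0, 1, \dots, \lfloor T/\Delta t \rfloor$;

(d) the function $f(x)$ together with its partial derivatives of order up to and including $2p+2$ are bounded.

Then for all $k\Delta t \in [0,T]$

$$\left|\mathbb{E}f(X(k\Delta t)) - \mathbb{E}f(X_k)\right| \le C\|f\|_{2p+2}\,\Delta t^p.$$

The constant $C$ depends on $x_0$, $a$, $\sigma_r$, and $T$ but not on $f$ and $\Delta t$.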

Proof. We define $Y(x,t)$ to be $X(k\Delta t)$, where $X$ is the solution of (3.1) with initial condition $X(t) = x$, $t \le k\Delta t$. Define the function $u(x,t)$ by

$$u(x,t) := \mathbb{E}f(Y(x,t)).$$

It follows from the proof of Theorem 2.1 in [11, p. 100] that

$$\left|\mathbb{E}f(X(k\Delta t)) - \mathbb{E}f(X_k)\right| \le A\,\Delta t^p \max_{t \in [0, k\Delta t]}\|u(\cdot,t)\|_{2p+2}$$

for some constant $A$ not depending on $f$.

In [10, p. 223] it is shown that if the coefficients $a$ and $\sigma_r$ of the system of SDEs (3.1) have globally Lipschitz continuous derivatives up to order $2p+2$ (condition (a)), then $Y(x,t)$ has continuous derivatives with respect to $x$ up to order $2p+2$, almost surely.

Let $\partial_i$ denote differentiation of a function with respect to its $i$th argument. Formally, we can differentiate $u$ with respect to $x$ to obtain

$$\partial_i u = \sum_a \mathbb{E}\bigl[(\partial_a f(Y))(\partial_i Y_a)\bigr],$$

$$\partial_i\partial_j u = \sum_{a,b}\mathbb{E}\bigl[(\partial_a\partial_b f(Y))(\partial_i Y_a)(\partial_j Y_b)\bigr] + \sum_a \mathbb{E}\bigl[(\partial_a f(Y))(\partial_i\partial_j Y_a)\bigr],$$

and so forth, using the product and chain rules. To justify the formal differentiations, we need only observe that all multi-derivatives of $f$ up to order $2p+2$ are bounded, and remark that it follows from [10] that all moments of the derivatives of $Y$ up to order $2p+2$ are finite. The exchange of differentiation with expectation then follows in each case by Fubini's theorem [17, p. 222]. Applying the Cauchy-Schwarz inequality to each term gives

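$$\sup_{x,t}|D^\alpha u| \le \sum_i \bigl\{\mathbb{E}(D^{\gamma_i} f(Y))^2\bigr\}^{1/2} E_{\gamma_i},$$

where the $\gamma_i$ are a sequence of multi-indices with $|\gamma_i| \le |\alpha|$ and the $E_{\gamma_i}$ are some constants independent of $f$. So

$$\sup_{x \in \mathbb{R}^n,\ t \in [0, k\Delta t]}|D^\alpha u| \le F_\alpha\|f\|_{|\alpha|}$$

for some constants $F_\alpha$ independent of $f$. Summing this inequality over all $\alpha$ with $|\alpha| \le 2p+2$ gives us the result. □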
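Condition (b) of Theorem 3.1 can be checked by hand in the simplest setting. As an illustration of our own, take $n = q = 1$, the linear SDE $dX = aX\,dt + bX\,dW$, and the weak Euler-Maruyama increment $\bar\delta(x,\Delta t) = x(a\Delta t + b\Delta t^{1/2}N)$ from (1.5), with $N = \pm 1$. Then $X(\Delta t) = xZ$ with $Z$ lognormal and $\mathbb{E}Z^m = e^{ma\Delta t + m(m-1)b^2\Delta t/2}$, so every moment in (b) has a closed form. For $p = 1$ the moment gaps for $s = 1, 2, 3$ and the fourth moment of $\bar\delta$ should all scale like $\Delta t^2$. The function names in the sketch are hypothetical.

```python
import math


def moment_gaps(a, b, x, dt):
    """|E delta^s - E delta_bar^s| for s = 1, 2, 3 (scalar case, dX = a X dt + b X dW).

    delta: exact increment x (Z - 1), with E Z^m = exp(m a dt + m (m - 1) b^2 dt / 2).
    delta_bar: weak Euler-Maruyama increment x (u + v N), u = a dt, v = b sqrt(dt), N = +/-1.
    """
    ez = [math.exp(m * a * dt + 0.5 * m * (m - 1) * b * b * dt) for m in range(4)]
    exact = [
        x * (ez[1] - 1.0),                                   # E (Z - 1)
        x ** 2 * (ez[2] - 2.0 * ez[1] + 1.0),                # E (Z - 1)^2
        x ** 3 * (ez[3] - 3.0 * ez[2] + 3.0 * ez[1] - 1.0),  # E (Z - 1)^3
    ]
    u, v = a * dt, b * math.sqrt(dt)
    weak = [
        x * u,                              # odd moments of N vanish
        x ** 2 * (u * u + v * v),
        x ** 3 * (u ** 3 + 3.0 * u * v * v),
    ]
    return [abs(e - w) for e, w in zip(exact, weak)]


def weak_fourth_moment(a, b, x, dt):
    """E delta_bar^4 = x^4 (u^4 + 6 u^2 v^2 + v^4), the last quantity in (b) when p = 1."""
    u, v = a * dt, b * math.sqrt(dt)
    return x ** 4 * (u ** 4 + 6.0 * u * u * v * v + v ** 4)


for dt in (0.002, 0.001):
    print(dt, moment_gaps(1.0, 0.5, 1.0, dt), weak_fourth_moment(1.0, 0.5, 1.0, dt))
```

Halving $\Delta t$ divides each printed quantity by roughly 4, consistent with a bound of the form $K(x)\Delta t^{p+1}$ with $p = 1$, and with the first-order weak rate of (1.4).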
Putting Theorems 2.2 and 3.1 together gives us our conclusion for this section.

Corollary 3.1 Let conditions (a), (b), (c) of Theorem 3.1 be satisfied. Then for some constant $K$

$$\rho(X(k\Delta t), X_k) \le K\,\Delta t^{p/(2p+3)}$$

for all $k\Delta t \le T$.

Proof. By Theorem 3.1 and the definition of $\beta_l$, we have that

$$\beta_{2p+2}(X(k\Delta t), X_k) \le C\,\Delta t^p.$$

Applying Theorem 2.2 with $l = 2p+2$ then gives the result. □
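Written out, the two inequalities combine as follows. Here $C_\beta$ is our label for the constant of Theorem 2.2 with $l = 2p+2$, and $C$ is the constant of Theorem 3.1 applied to test functions with $\|f\|_{2p+2} \le 1$; the computation shows that one may take $K = C_\beta C^{1/(2p+3)}$.

```latex
\rho(X(k\Delta t), X_k)
  \le C_\beta\,\beta_{2p+2}(X(k\Delta t), X_k)^{1/(2p+3)}
  \le C_\beta\,\bigl(C\,\Delta t^{p}\bigr)^{1/(2p+3)}
  = C_\beta\,C^{1/(2p+3)}\,\Delta t^{p/(2p+3)} .
```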



4 Strong Convergence 

In this section, we apply the Strassen-Dudley theorem to show that, after being re-embedded in another probability space, weakly converging methods for stochastic differential equations converge strongly with a reduced order. This re-embedding immediately gives a rate of convergence in the Wasserstein distance. As a corollary, we establish a rate of convergence of $\mathbb{E}f(X_{T/\Delta t})$ to $\mathbb{E}f(X(T))$ that requires only that $f$ is locally Lipschitz with a polynomial growth condition.

Theorem 4.1 Let conditions (a), (b), (c) of Theorem 3.1 be satisfied. There is a probability space on which random vectors $Y$ and $Z$ are defined such that $Y$ has the same distribution as $X(T)$, $Z$ has the same distribution as $X_{T/\Delta t}$, and, for any $\varepsilon > 0$,

$$\mathbb{E}|Y - Z| \le C\,\Delta t^{p/(2p+3) - \varepsilon}$$

for some constant $C$, for all sufficiently small $\Delta t$.

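Proof. Let $\alpha = K\Delta t^{p/(2p+3)}$, where $K$ is as in Corollary 3.1 (enlarged slightly, if necessary, so that $\rho(X(T), X_{T/\Delta t}) < \alpha$ holds strictly). Theorem 2.1 together with Corollary 3.1 establishes the existence of the random vectors $Y$ and $Z$ with the correct distributions such that

$$\mathbb{E}\,1_{\{|Y-Z|>\alpha\}} = P(|Y - Z| > \alpha) \le \alpha.$$

Choose $\varepsilon > 0$. Since $\Delta t^{p/(2p+3)-\varepsilon}$ increases with $\varepsilon$ when $\Delta t < 1$, it suffices to consider $\varepsilon < p/(2p+3)$. Now, the conditions of Theorem 3.1 ensure that both $Y$ and $Z$, and hence $|Y - Z|$, have finite moments of all orders, bounded independently of $\Delta t$. Choose real numbers $q_1, q_2 > 1$ such that $1/q_1 + 1/q_2 = 1$ and $p/((2p+3)q_2) = p/(2p+3) - \varepsilon$. Using Hölder's inequality, we obtain

$$\begin{aligned}
\mathbb{E}|Y - Z| &= \mathbb{E}\bigl[|Y-Z|\,1_{\{|Y-Z|>\alpha\}}\bigr] + \mathbb{E}\bigl[|Y-Z|\,1_{\{|Y-Z|\le\alpha\}}\bigr] \\
&\le \bigl(\mathbb{E}|Y-Z|^{q_1}\bigr)^{1/q_1}\bigl(\mathbb{E}\,1_{\{|Y-Z|>\alpha\}}\bigr)^{1/q_2} + \alpha \\
&\le \bigl(\mathbb{E}|Y-Z|^{q_1}\bigr)^{1/q_1}\alpha^{1/q_2} + \alpha \\
&\le C\,\Delta t^{p/(2p+3) - \varepsilon}
\end{aligned}$$

for all sufficiently small $\Delta t$, as required. □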

Applying this theorem to the case of weak Euler-Maruyama (see Equation (1.5)) with $p = 1$ implies a strong rate of convergence after re-embedding of $1/5 - \varepsilon$ for any $\varepsilon > 0$.
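The exponent $p/(2p+3)$ is simple arithmetic in the weak order $p$. A short Python sketch (the function name is ours) tabulates it with exact fractions for the first few orders; since $p/(2p+3) < 1/2$ for every $p$ and the exponent increases with $p$, higher-order weak methods improve the re-embedded strong rate but never reach $1/2$.

```python
from fractions import Fraction


def reembedding_rate(p):
    """Exponent p/(2p+3) from Corollary 3.1 and Theorem 4.1, before subtracting epsilon."""
    if p < 1:
        raise ValueError("weak order p must be a positive integer")
    return Fraction(p, 2 * p + 3)


for p in range(1, 5):
    print(f"p = {p}: strong rate after re-embedding {reembedding_rate(p)} - epsilon")
```

This prints the exponents 1/5, 2/7, 1/3 and 4/11.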

Now we express our result in terms of the Wasserstein distance, also known as the Wasserstein-1 distance, the 
Monge-Wasserstein distance yl p. 420], or the Kantorovich-Rubinstein distance ip. 206]. To define this metric, 
let $X$ and $Y$ be random elements of $S$ and let $M(X,Y)$ be the set of all probability measures $\mu$ on $S \times S$ such that the marginals of $\mu$ are the probability measures induced on $S$ by $X$ and $Y$ respectively. Then the Wasserstein distance is

$$W(X,Y) = \inf_{\mu \in M(X,Y)} \mathbb{E}_\mu\, d(x,y) = \inf_{\mu \in M(X,Y)} \int_{S \times S} d(x,y)\,d\mu(x,y),$$

where $\mathbb{E}_\mu$ denotes expectation with respect to the measure $\mu$ for $(x,y) \in S \times S$. In words, the Wasserstein distance is the minimal $L^1$ distance between $X$ and $Y$ after re-embedding. Therefore Theorem 4.1 shows the following:



Corollary 4.1 Let conditions (a), (b), (c) of Theorem 3.1 be satisfied. Then for any $\varepsilon > 0$

$$W(X(T), X_{T/\Delta t}) \le C\,\Delta t^{p/(2p+3) - \varepsilon}$$

for some constant $C$, for sufficiently small $\Delta t$. □
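For two empirical distributions on the real line with the same number of equally weighted atoms, the infimum over couplings in the definition of $W$ is attained by pairing the sorted samples (the standard fact that the monotone coupling is optimal in one dimension). A minimal Python sketch under that assumption follows; the function name is hypothetical. Applied to samples of $X(T)$ and $X_{T/\Delta t}$ it only gives a statistical estimate of $W$, with its own sampling error, not the bound of Corollary 4.1.

```python
def wasserstein1_empirical(xs, ys):
    """W(X, Y) for two empirical distributions on R with equally many equally weighted atoms.

    On the real line the monotone (sorted) coupling is optimal, so the infimum over
    couplings reduces to pairing order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs_sorted, ys_sorted)) / len(xs)


print(wasserstein1_empirical([0.0, 1.0, 2.0], [3.0, 2.0, 1.0]))
```

Shifting a sample by 1 gives distance 1, even though pairing the samples in their given order would suggest a larger mean distance; this is the re-embedding at work.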

As another corollary to Theorem 4.1 we show that $\mathbb{E}f(X_{T/\Delta t}) \to \mathbb{E}f(X(T))$ given some polynomial growth conditions on $f$, even when $f$ is only locally Lipschitz. This result is like the usual weak convergence result [11, p. 100], but with a relaxed smoothness requirement on $f$ and a reduced rate. Compare with Mikulevicius and Platen's result jH p. 460] or Bally and Talay's result [1], both of which only apply to strong Euler-Maruyama (see Equation (1.2)).



Corollary 4.2 Let conditions (a), (b), (c) of Theorem 3.1 be satisfied. Let $f$ be locally Lipschitz with

$$|f(x) - f(y)| \le L_R|x - y|$$

whenever $|x| \le R$ and $|y| \le R$, where

$$L_R \le C(1 + R^K)$$

for some constants $C$ and $K$. Then for any $\varepsilon > 0$

$$\left|\mathbb{E}f(X_{T/\Delta t}) - \mathbb{E}f(X(T))\right| \le C'\,\Delta t^{p/(2p+3) - \varepsilon}$$

for some constant $C'$, for sufficiently small $\Delta t$.

Proof. Let $\alpha = K_0\Delta t^{p/(2p+3)}$, where $K_0$ denotes the constant $K$ of Corollary 3.1. Let $Y$ and $Z$ be as in the proof of Theorem 4.1, so that $Y$ has the same distribution as $X(T)$, $Z$ has the same distribution as $X_{T/\Delta t}$, and $P(|Y - Z| > \alpha) \le \alpha$. Let $M = \max(|Y|, |Z|)$. We immediately have

$$\left|\mathbb{E}f(X_{T/\Delta t}) - \mathbb{E}f(X(T))\right| = \left|\mathbb{E}f(Z) - \mathbb{E}f(Y)\right| \le \mathbb{E}|f(Y) - f(Z)|.$$

Let $R$ be an arbitrary radius which we shall fix later. We can split the quantity of interest into three terms:

$$\begin{aligned}
\mathbb{E}|f(Y)-f(Z)| &\le \mathbb{E}|f(Y)-f(Z)|\,1_{\{|Y-Z|\le\alpha,\ M\le R\}} + \mathbb{E}|f(Y)-f(Z)|\,1_{\{|Y-Z|>\alpha,\ M\le R\}} + \mathbb{E}|f(Y)-f(Z)|\,1_{\{M>R\}} \\
&=: T_1 + T_2 + T_3.
\end{aligned}$$

To bound the first term, note that

$$T_1 \le L_R\,\alpha \le \alpha C(1 + R^K).$$

To bound the second and third terms, note that

$$|f(x)| \le |f(0)| + |f(x) - f(0)| \le |f(0)| + L_{|x|}|x| \le D(1 + |x|^{K+1})$$

for all $x$, for some constant $D$. Then

$$\begin{aligned}
T_2 &\le \mathbb{E}\{|f(Y)| + |f(Z)|\}\,1_{\{|Y-Z|>\alpha,\ M\le R\}} \\
&\le \mathbb{E}\{D(1+|Y|^{K+1}) + D(1+|Z|^{K+1})\}\,1_{\{|Y-Z|>\alpha,\ M\le R\}} \\
&\le 2D(1+R^{K+1})\,\mathbb{E}\,1_{\{|Y-Z|>\alpha\}} \le 2D(1+R^{K+1})\,\alpha.
\end{aligned}$$

To bound the third term, we use the fact that all moments of $Y$ and $Z$ are finite and the bounds on the moments of $Z$ are independent of $\Delta t$. For any exponent $q > K+1$ (which we shall choose later), we let $m_q$ denote this bound, so that $\mathbb{E}|Y|^q \le m_q$ and $\mathbb{E}|Z|^q \le m_q$. These inequalities in turn imply that $\mathbb{E}M^q \le 2m_q$. Since $q > K+1$, the function $r \mapsto (1 + r^{K+1})/r^q$ is decreasing, so

$$\begin{aligned}
T_3 &\le \mathbb{E}\{|f(Y)| + |f(Z)|\}\,1_{\{M>R\}} \le \mathbb{E}\,2D(1 + M^{K+1})\,1_{\{M>R\}} \\
&= 2D\,\mathbb{E}\left[\frac{1 + M^{K+1}}{M^q}\,M^q\,1_{\{M>R\}}\right] \le 2D\,\frac{1 + R^{K+1}}{R^q}\,2m_q \le C_q R^{K+1-q}
\end{aligned}$$

for all sufficiently large $R$. Putting these three bounds together gives

$$\mathbb{E}|f(Y) - f(Z)| \le \alpha C_1 R^{K+1} + C_q R^{K+1-q}$$

for some constant $C_1$ and sufficiently large $R$. We get to choose both $R$ and $q$. For any $q > K+1$, if we choose $R = \alpha^{-1/q}$ we obtain

$$\mathbb{E}|f(Y) - f(Z)| \le C_1\alpha^{1-(K+1)/q} + C_q\alpha^{1-(K+1)/q}.$$

Choosing $q$ large enough gives the desired result. □
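The final choice of $q$ can be made explicit. Writing $c$ for the constant of Corollary 3.1, so that $\alpha = c\,\Delta t^{p/(2p+3)}$, and $A_q$ for the combined constant multiplying $\alpha^{1-(K+1)/q}$ at the end of the proof (both labels are ours), any $q > K+1$ that also satisfies the inequality on the right yields the stated rate for $\Delta t \le 1$:

```latex
A_q\,\alpha^{1-\frac{K+1}{q}}
  = A_q\,c^{1-\frac{K+1}{q}}\,\Delta t^{\frac{p}{2p+3}\left(1-\frac{K+1}{q}\right)}
  \le A_q\,c^{1-\frac{K+1}{q}}\,\Delta t^{\frac{p}{2p+3}-\varepsilon}
\quad\text{whenever}\quad
q \ge \frac{(K+1)\,p}{(2p+3)\,\varepsilon},
```

since that condition on $q$ is equivalent to $\frac{p}{2p+3}\bigl(1-\frac{K+1}{q}\bigr) \ge \frac{p}{2p+3} - \varepsilon$.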



5 Pathwise Convergence in Distribution 

The results in previous sections concern the pointwise weak convergence of numerical methods for SDEs, that is, convergence at each point in time $t$. A stronger result is that entire trajectories generated by the numerical method converge weakly to those of the system of SDEs. Stroock and Varadhan prove pathwise convergence in distribution of numerical methods in great generality in [13], but they do not provide a rate. Here we review their result and apply a
re-embedding theorem to establish the corresponding strong result for embedded random paths. Since no rate appears 
to be established for Stroock and Varadhan's result, we do not phrase results in terms of the Prokhorov metric and 
instead just consider convergence in distribution. Moreover, we can use Skorohod's theorem for re-embedding rather 
than the Strassen-Dudley theorem. The latter gives precise rates of strong convergence but the former allows one to 
construct a whole sequence of random paths and their limit on one probability space. 

First we review the definition of convergence in distribution in $C^n[0,T]$, the space of continuous $\mathbb{R}^n$-valued functions on $[0,T]$ [2]. For any fixed $T$ and initial condition $x_0$, the solution to the system of SDEs (3.1) gives a random element of $C^n[0,T]$, which we denote by $X$. For the same $T$ and initial condition, the numerical method with step-size $\Delta t$ gives a sequence $X_k$, $k = 0, 1, \dots$. We define the linear interpolant $X_{\Delta t}$ of the values $X_k$ by

$$X_{\Delta t}(t) = X_{\lfloor t/\Delta t\rfloor} + \bigl(t/\Delta t - \lfloor t/\Delta t\rfloor\bigr)\bigl(X_{\lfloor t/\Delta t\rfloor + 1} - X_{\lfloor t/\Delta t\rfloor}\bigr)$$

for $t \in [0,T]$. Thus $X_{\Delta t}$ is a random element of $C^n[0,T]$. If we equip $C^n[0,T]$ with the norm $\|\cdot\|_\infty$, we obtain a metric space with metric

$$d(x,y) := \|x - y\|_\infty = \sup_{t \in [0,T]}|x(t) - y(t)|$$

for $x, y \in C^n[0,T]$. We say that $X_{\Delta t}$ converges in distribution to $X$ if for all bounded continuous functions $f : C^n[0,T] \to \mathbb{R}$

$$\mathbb{E}f(X_{\Delta t}) \to \mathbb{E}f(X)$$

as $\Delta t \to 0$. Stroock and Varadhan's result gives conditions on the original system of SDEs and the numerical method under which $X_{\Delta t}$ converges in distribution to $X$.
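The interpolant and the metric $d$ are easy to evaluate. The sketch below assumes scalar values ($n = 1$) for simplicity; the function names are hypothetical. Because the difference of two piecewise-linear functions with the same breakpoints is again piecewise linear, the supremum in $d$ for two interpolants on a common grid is attained at a grid point.

```python
import math


def linear_interpolant(values, dt, t):
    """Evaluate X_dt(t) = X_j + (t/dt - j)(X_{j+1} - X_j), j = floor(t/dt), from grid values X_0, X_1, ..."""
    s = t / dt
    j = min(math.floor(s), len(values) - 2)  # at t = T use the last interval
    return values[j] + (s - j) * (values[j + 1] - values[j])


def sup_distance(x_values, y_values):
    """d(x, y) = sup_t |x(t) - y(t)| for two interpolants sharing the same grid.

    The difference is piecewise linear with breakpoints on the grid, so the
    supremum is attained at a grid point.
    """
    return max(abs(x - y) for x, y in zip(x_values, y_values))


print(linear_interpolant([0.0, 1.0, 4.0], 0.5, 0.75))       # halfway between X_1 and X_2
print(sup_distance([0.0, 1.0, 4.0], [0.0, 2.0, 3.0]))
```

For vector-valued $X_k$ the same formulas apply componentwise, with $|\cdot|$ the Euclidean norm in `sup_distance`.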

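The system of SDEs (3.1) we consider is determined by its coefficients $a$ and $\sigma_r$. We define the matrix $b$ from the $\sigma_r$ by

$$b_{ij}(x) = \sum_r \sigma_{ri}(x)\,\sigma_{rj}(x) \tag{5.1}$$

for $i, j = 1, \dots, n$, where $\sigma_{ri}$ is the $i$th component of $\sigma_r$. Recall that the increment of the numerical method (3.2) starting from $x$ is denoted $\bar\delta(x,\Delta t)$. For our numerical method we define corresponding coefficients $a^{\Delta t}$ and $b^{\Delta t}$ by

$$a^{\Delta t}_i(x) = \frac{1}{\Delta t}\,\mathbb{E}\,\bar\delta_i(x,\Delta t)\,1_{\{|\bar\delta(x,\Delta t)| \le 1\}}$$

and

$$b^{\Delta t}_{ij}(x) = \frac{1}{\Delta t}\,\mathbb{E}\,\bar\delta_i(x,\Delta t)\,\bar\delta_j(x,\Delta t)\,1_{\{|\bar\delta(x,\Delta t)| \le 1\}}.$$

Finally, we define $\Delta^\varepsilon_{\Delta t}$ by

$$\Delta^\varepsilon_{\Delta t}(x) = \frac{1}{\Delta t}\,P\bigl(|\bar\delta(x,\Delta t)| > \varepsilon\bigr).$$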
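For a method whose increment takes finitely many values, these truncated moments can be computed exactly. The sketch assumes the scalar case $n = q = 1$ and the usual Stroock-Varadhan form of the coefficients, $a^{\Delta t}(x) = \frac{1}{\Delta t}\mathbb{E}\,\bar\delta\,1_{\{|\bar\delta| \le 1\}}$, $b^{\Delta t}(x) = \frac{1}{\Delta t}\mathbb{E}\,\bar\delta^2\,1_{\{|\bar\delta| \le 1\}}$ and $\Delta^\varepsilon_{\Delta t}(x) = \frac{1}{\Delta t}P(|\bar\delta| > \varepsilon)$, applied to the weak Euler-Maruyama increment $\mathrm{drift}\cdot\Delta t \pm \sigma\Delta t^{1/2}$ at a fixed point $x$. The function names and the arguments `drift` and `sigma` (the values $a(x)$ and $\sigma_1(x)$) are hypothetical.

```python
import math


def sv_coefficients(increments, probs, dt, eps):
    """a^dt, b^dt and Delta^eps_dt at a fixed x for a scalar method whose increment
    delta_bar(x, dt) takes the values `increments` with probabilities `probs`."""
    a_dt = sum(p * d for d, p in zip(increments, probs) if abs(d) <= 1.0) / dt
    b_dt = sum(p * d * d for d, p in zip(increments, probs) if abs(d) <= 1.0) / dt
    big_jumps = sum(p for d, p in zip(increments, probs) if abs(d) > eps) / dt
    return a_dt, b_dt, big_jumps


def weak_em_increments(drift, sigma, dt):
    """Weak Euler-Maruyama increment drift*dt +/- sigma*sqrt(dt), each with probability 1/2."""
    step = sigma * math.sqrt(dt)
    return [drift * dt + step, drift * dt - step], [0.5, 0.5]


inc, pr = weak_em_increments(drift=1.0, sigma=0.5, dt=0.01)
print(sv_coefficients(inc, pr, dt=0.01, eps=0.5))
```

Here $a^{\Delta t}(x)$ equals the drift exactly, $b^{\Delta t}(x) = \sigma^2 + \mathrm{drift}^2\Delta t$ tends to $b(x) = \sigma^2$, and $\Delta^\varepsilon_{\Delta t}(x) = 0$ once $\Delta t$ is small enough that both increments lie in $[-\varepsilon, \varepsilon]$, in line with condition (c) of the theorem below.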



In the following let $\|\cdot\|$ denote any norm on the space of $n \times n$ matrices.
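Theorem 5.1 Suppose that

(a) the coefficients $a$ and $\sigma_r$ of the SDEs (3.1) are locally Lipschitz continuous;

(b) there is a constant $C$ such that for all $x \in \mathbb{R}^n$

$$x^{\mathsf T}a(x) \le C(1 + |x|^2) \quad\text{and}\quad \|b(x)\| \le C(1 + |x|^2);$$

(c) for all $R > 0$

$$\lim_{\Delta t \to 0}\sup_{|x| \le R}\left|a^{\Delta t}(x) - a(x)\right| = 0, \qquad \lim_{\Delta t \to 0}\sup_{|x| \le R}\left\|b^{\Delta t}(x) - b(x)\right\| = 0,$$

and

$$\lim_{\Delta t \to 0}\sup_{|x| \le R}\Delta^\varepsilon_{\Delta t}(x) = 0$$

for all $\varepsilon > 0$.

Then $X_{\Delta t}$ converges in distribution to $X$ in $C^n[0,T]$.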

Proof. This result is Theorem 11.2.3 of [13], using Theorems 5.3.1 and 5.3.2 of to obtain the well-posedness of the martingale problem. □

We remark that condition (b) of Theorem 5.1 is stronger than necessary. The result holds with (b) replaced by the weaker condition that the martingale problem with coefficients $a$ and $b$ has a unique solution.

To give a feeling for the power of this result, here are some examples of functions $f$ for which it applies. First, we can recover the simpler pointwise results (without rates) if we let $f : C^n[0,T] \to \mathbb{R}$ be defined by $f(X) := g(X(t))$ for some time $t \in [0,T]$. If $g : \mathbb{R}^n \to \mathbb{R}$ is continuous and bounded, then $f$ is continuous and bounded and the previous theorem tells us that $\mathbb{E}g(X_{\Delta t}(t)) \to \mathbb{E}g(X(t))$. More generally, we can choose $f$ to depend on $X$ through a number of points $t_1, \dots, t_k \in [0,T]$. Suppose $g : \mathbb{R}^{nk} \to \mathbb{R}$ is bounded and continuous. Letting $f(X) := g(X(t_1), \dots, X(t_k))$ gives us

$$\mathbb{E}g\bigl(X_{\Delta t}(t_1), \dots, X_{\Delta t}(t_k)\bigr) \to \mathbb{E}g\bigl(X(t_1), \dots, X(t_k)\bigr)$$

as $\Delta t \to 0$. Even more generally, we can look at functions that do not depend on any finite number of times $t_i$. For example, in the scalar case $n = 1$, let $g : \mathbb{R} \to \mathbb{R}$ be bounded and continuous and define $f(X) := g\bigl(\max_{t \in [0,T]} X(t)\bigr)$. The previous theorem tells us that $\mathbb{E}f(X_{\Delta t}) \to \mathbb{E}f(X)$ as $\Delta t \to 0$.

These examples all rely on $f$ being continuous and bounded. However, these assumptions can be weakened considerably in some cases. One result of this type is that if $f$ is bounded and measurable and the probability of $X$ falling in the set of discontinuities of $f$ is zero, then it still holds that $\mathbb{E}f(X_{\Delta t}) \to \mathbb{E}f(X)$ H p. 21]. As an example, let $f(X) = 1_{\{X(t_1) \in A_1,\ X(t_2) \in A_2\}}$. Since the discontinuities of $f$ lie in $\{X(t_1) \in \partial A_1\} \cup \{X(t_2) \in \partial A_2\}$, if we can show $P(X(t_1) \in \partial A_1) = P(X(t_2) \in \partial A_2) = 0$, it follows that

$$P\bigl(X_{\Delta t}(t_1) \in A_1,\ X_{\Delta t}(t_2) \in A_2\bigr) \to P\bigl(X(t_1) \in A_1,\ X(t_2) \in A_2\bigr)$$

as $\Delta t \to 0$. Similarly, the condition on the boundedness of $f$ can be relaxed given some a priori knowledge on the distributions of $X_{\Delta t}$ and $X$ yj, p. 31].

We now put the conditions of Theorem 5.1 in terms more familiar in the numerical analysis of SDEs.

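Theorem 5.2 Suppose the coefficients $a$ and $\sigma_r$ of (3.1) are locally Lipschitz continuous and condition (b) of Theorem 5.1 is satisfied. Suppose that the following limits hold as $\Delta t \to 0$, uniformly on bounded subsets of $\mathbb{R}^n$:

$$\frac{1}{\Delta t}\left|\mathbb{E}\bar\delta_i - \mathbb{E}\delta_i\right| \to 0, \tag{5.2}$$

$$\frac{1}{\Delta t}\left|\mathbb{E}\bar\delta_i\bar\delta_j - \mathbb{E}\delta_i\delta_j\right| \to 0, \tag{5.3}$$

and

$$\frac{1}{\Delta t}\,\mathbb{E}\left|\bar\delta_i\bar\delta_j\bar\delta_k\right| \to 0, \tag{5.4}$$

for all $i, j, k = 1, \dots, n$. Then $X_{\Delta t}$ converges in distribution to $X$ in $C^n[0,T]$.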

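Proof. We prove this result by showing that (5.2), (5.3), and (5.4) imply condition (c) of Theorem 5.1. In the following we suppress the arguments of $\delta(x,\Delta t)$ and $\bar\delta(x,\Delta t)$. Fix a bounded set in $\mathbb{R}^n$. We start by proving that (5.4) implies $\Delta^\varepsilon_{\Delta t}(x) \to 0$ uniformly. Using Chebyshev's inequality,

$$\Delta^\varepsilon_{\Delta t}(x) = \frac{1}{\Delta t}P\bigl(|\bar\delta(x,\Delta t)| > \varepsilon\bigr) \le \frac{1}{\Delta t}\sum_i P\bigl(|\bar\delta_i| > \varepsilon/n\bigr) \le \frac{1}{\Delta t}\sum_i \mathbb{E}|\bar\delta_i|^3\,(n/\varepsilon)^3,$$

which goes to zero as $\Delta t \to 0$ by (5.4) with $i = j = k$.

Next we prove that $\|b^{\Delta t}(x) - b(x)\|$ goes to zero uniformly as $\Delta t \to 0$. The argument for $|a^{\Delta t}(x) - a(x)|$ is analogous, so we omit it. Note that it suffices to show that $|b^{\Delta t}_{ij} - b_{ij}| \to 0$ uniformly on the chosen set for all $i, j$. For each $i$ and $j$,

$$\begin{aligned}
\left|b^{\Delta t}_{ij} - b_{ij}\right| &= \left|\frac{1}{\Delta t}\mathbb{E}\,\bar\delta_i\bar\delta_j 1_{\{|\bar\delta|\le1\}} - b_{ij}\right| \\
&\le \frac{1}{\Delta t}\left|\mathbb{E}\,\bar\delta_i\bar\delta_j 1_{\{|\bar\delta|\le1\}} - \mathbb{E}\,\bar\delta_i\bar\delta_j\right| + \frac{1}{\Delta t}\left|\mathbb{E}\,\bar\delta_i\bar\delta_j - \mathbb{E}\,\delta_i\delta_j\right| + \left|\frac{1}{\Delta t}\mathbb{E}\,\delta_i\delta_j - b_{ij}\right|.
\end{aligned}$$

The second term goes to zero by (5.3). The third term goes to zero by general properties of SDEs. The first term is equal to

$$\frac{1}{\Delta t}\left|\mathbb{E}\,\bar\delta_i\bar\delta_j 1_{\{|\bar\delta|>1\}}\right| \le \frac{1}{\Delta t}\,\mathbb{E}\,|\bar\delta_i\bar\delta_j|\,|\bar\delta| \le \frac{1}{\Delta t}\sum_k \mathbb{E}\left|\bar\delta_i\bar\delta_j\bar\delta_k\right|,$$

which goes to zero by (5.4). □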
In order to show a strong re-embedding type of result, we use the following theorem, sometimes called Skorohod's theorem.

Theorem 5.3 (See JH p. 70].) Let $S$ be a separable metric space with metric $d$. Suppose that $X_n$, $n \ge 1$, and $X$ are random variables taking values in $S$, and that $X_n$ converges in distribution to $X$. Then there are random variables $Y_n$, $n \ge 1$, and $Y$, all defined on the same probability space, such that the distribution of $Y_n$ is the same as $X_n$ for all $n$, the distribution of $Y$ is the same as $X$, and $Y_n$ converges to $Y$ almost surely.
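On the real line the representation in Theorem 5.3 can be made explicit with inverse distribution functions: with a single uniform random variable $U$, set $Y_n = F_n^{-1}(U)$ and $Y = F^{-1}(U)$, where $F_n$ and $F$ are the distribution functions of $X_n$ and $X$. This is the standard one-dimensional construction, not the general one needed on $C^n[0,T]$. The sketch uses Python's `statistics.NormalDist` with the hypothetical example $X_n \sim N(1/n, 1)$ and $X \sim N(0,1)$, for which $Y_n - Y = 1/n$ for every value of $U$; the function name is ours.

```python
from statistics import NormalDist


def quantile_coupling(u, n):
    """Return (Y_n, Y) = (F_n^{-1}(u), F^{-1}(u)) for X_n ~ N(1/n, 1) and X ~ N(0, 1).

    Feeding one uniform value u in (0, 1) into both inverse CDFs puts Y_n and Y on a
    common probability space with the correct marginals.
    """
    y_n = NormalDist(mu=1.0 / n, sigma=1.0).inv_cdf(u)
    y = NormalDist(mu=0.0, sigma=1.0).inv_cdf(u)
    return y_n, y


for n in (1, 10, 100):
    print(n, quantile_coupling(0.9, n))
```

In this example the convergence $Y_n \to Y$ is even uniform in $U$; in general the inverse-CDF construction only gives almost sure convergence, which is all Theorem 5.3 asserts.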

Applying this theorem to $X_{\Delta t}$ and $X$ gives the following result.

Theorem 5.4 Let either the conditions of Theorem 5.1 or the conditions of Theorem 5.2 hold. Let $\Delta t_n$ be a sequence of positive step-sizes converging to 0. Then there are random elements $Y_n$ and $Y$ of $C^n[0,T]$ such that $Y_n$ has the same distribution as $X_{\Delta t_n}$, $Y$ has the same distribution as $X$, and $Y_n \to Y$ in $C^n[0,T]$ almost surely. Thus, almost surely,

$$\lim_{n \to \infty}\sup_{t \in [0,T]}\left|Y_n(t) - Y(t)\right| = 0.$$